From: Rich Felker
Subject: Re: [PATCH] inline llsc atomics when compiling for sh4a
Date: Sun, 17 May 2015 22:34:02 -0400
To: musl@lists.openwall.com
Message-ID: <20150518023402.GS17573@brightrain.aerifal.cx>
In-Reply-To: <20150517185516.GA32020@duality.lan>

On Sun, May 17, 2015 at 01:55:16PM -0500, Bobby Bingham wrote:
> If we're building for sh4a, the compiler is already free to use
> instructions only available on sh4a, so we can do the same and inline the
> llsc atomics. If we're building for an older processor, we still do the
> same runtime atomics selection as before.

Thanks! I think it's ok for commit as-is, but based on re-reading this
code I have some ideas for improving it that are orthogonal to this
change. See comments inline:

> ---
>  arch/sh/atomic.h     |  83 +++++++++++++++++++++++++++++++
>  arch/sh/src/atomic.c | 135 +++++++++++++++++----------------------------------
>  2 files changed, 128 insertions(+), 90 deletions(-)
>
> diff --git a/arch/sh/atomic.h b/arch/sh/atomic.h
> index a1d22e4..f2e6dac 100644
> --- a/arch/sh/atomic.h
> +++ b/arch/sh/atomic.h
> @@ -22,6 +22,88 @@ static inline int a_ctz_64(uint64_t x)
>  	return a_ctz_l(y);
>  }
>
> +#define LLSC_CLOBBERS "r0", "t", "memory"
> +#define LLSC_START(mem) "synco\n" \
> +	"0:	movli.l @" mem ", r0\n"
> +#define LLSC_END(mem) \
> +	"1:	movco.l r0, @" mem "\n" \
> +	"	bf 0b\n" \
> +	"	synco\n"
> +
> +static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> +{
> +	int old;
> +	__asm__ __volatile__(
> +		LLSC_START("%1")
> +		"	mov r0, %0\n"
> +		"	cmp/eq %0, %2\n"
> +		"	bf 1f\n"
> +		"	mov %3, r0\n"
> +		LLSC_END("%1")
> +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> +	return old;
> +}

The mov from r0 to %0 seems unnecessary here. Presumably it's because
you didn't have a constraint to force old to come out via r0. Could
you do the following?
static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	register int old __asm__("r0");
	__asm__ __volatile__(
		LLSC_START("%1")
		"	cmp/eq r0, %2\n"
		"	bf 1f\n"
		"	mov %3, r0\n"
		LLSC_END("%1")
		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
	return old;
}

Note that I had to drop LLSC_CLOBBERS because "r0" was in it and you
can't clobber an output register.

I've actually always wondered about the value of having the LLSC_*
macros. I usually prefer for the whole asm to be written out explicitly
and readable unless there's a compelling reason to wrap it up in
macros. Then it would look like:

static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	register int old __asm__("r0");
	__asm__ __volatile__(
		"	synco\n"
		"0:	movli.l @%1, r0\n"
		"	cmp/eq r0, %2\n"
		"	bf 1f\n"
		"	mov %3, r0\n"
		"1:	movco.l r0, @%1\n"
		"	bf 0b\n"
		"	synco\n"
		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
	return old;
}

and similar for other functions. Part of the motivation for not hiding
the outer logic in macros is that it might make it possible to
fold/simplify some special cases like I did above for CAS.

Another idea is letting the compiler simplify, with something like the
following, which could actually be used cross-platform for all
llsc-type archs:

static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (old == t && !llsc_end(p, s));
	return old;
}

Here llsc_start and llsc_end would be inline functions using asm with
appropriate constraints. Unfortunately I don't see a way to model using
the value of the truth flag "t" as the output of the asm for llsc_end,
though. I suspect this would be a problem on a number of other archs
too; the asm would have to waste an instruction (or several) converting
the flag to an integer. Unless there's a solution to that problem, it
makes an approach like this less appealing.

>  int __sh_cas(volatile int *p, int t, int s)
>  {
> +	if (__hwcap & CPU_HAS_LLSC) return __sh_cas_llsc(p, t, s);
> +
>  	int old;
> -	if (__hwcap & CPU_HAS_LLSC) {
> -		__asm__ __volatile__(
> -			LLSC_START("%1")
> -			"	mov r0, %0\n"
> -			"	cmp/eq %0, %2\n"
> -			"	bf 1f\n"
> -			"	mov %3, r0\n"
> -			LLSC_END("%1")
> -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> -	} else {
> -		__asm__ __volatile__(
> -			GUSA_START_EVEN("%1", "%0")
> -			"	cmp/eq %0, %2\n"
> -			"	bf 1f\n"
> -			GUSA_END("%1", "%3")
> -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
> -	}
> +	__asm__ __volatile__(
> +		GUSA_START_EVEN("%1", "%0")
> +		"	cmp/eq %0, %2\n"
> +		"	bf 1f\n"
> +		GUSA_END("%1", "%3")
> +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
>  	return old;
>  }

For the GUSA stuff, do you really need the ODD/EVEN macros? I think you
could just add an appropriate .align inside that would cause the
assembler to insert a nop if necessary.

Rich
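
A rough, untested sketch of the llsc_start/llsc_end helpers described
above might look like the following on sh4a. The helper names and
constraints here are hypothetical, and the movt in llsc_end is exactly
the wasted flag-to-integer conversion mentioned:

/* Hypothetical helpers for the "let the compiler simplify" idea.
 * Untested sketch; it also assumes nothing the compiler emits between
 * the two asm blocks breaks the movli/movco reservation. */
static inline int llsc_start(volatile int *p)
{
	int v;
	__asm__ __volatile__(
		"	synco\n"
		"	movli.l @%1, r0\n"	/* load-linked; destination must be r0 */
		"	mov r0, %0\n"
		: "=&r"(v) : "r"(p) : "r0", "memory");
	return v;
}

static inline int llsc_end(volatile int *p, int v)
{
	int ok;
	__asm__ __volatile__(
		"	mov %2, r0\n"		/* movco.l's source must be r0 */
		"	movco.l r0, @%1\n"	/* store-conditional; sets T on success */
		"	movt %0\n"		/* copy the T flag into a register */
		"	synco\n"
		: "=&r"(ok) : "r"(p), "r"(v) : "r0", "t", "memory");
	return ok;
}

With helpers along these lines the do/while version of __sh_cas_llsc
above would compile, at the cost of the extra movt (and mov) compared
with keeping the whole sequence in a single asm block.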