From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [PATCH] inline llsc atomics when compiling for sh4a
Date: Mon, 18 May 2015 20:30:45 -0400 [thread overview]
Message-ID: <20150519003045.GC17573@brightrain.aerifal.cx> (raw)
In-Reply-To: <20150518225617.GA1905@duality.lan>
On Mon, May 18, 2015 at 05:56:18PM -0500, Bobby Bingham wrote:
> On Sun, May 17, 2015 at 10:34:02PM -0400, Rich Felker wrote:
> > On Sun, May 17, 2015 at 01:55:16PM -0500, Bobby Bingham wrote:
> > > If we're building for sh4a, the compiler is already free to use
> > > instructions only available on sh4a, so we can do the same and inline the
> > > llsc atomics. If we're building for an older processor, we still do the
> > > same runtime atomics selection as before.
> >
> > Thanks! I think it's ok for commit as-is, but based on re-reading this
> > code I have some ideas for improving it that are orthogonal to this
> > change. See comments inline:
>
> Would you prefer I resend this patch to remove the LLSC_* macros at the
> same time, or another patch to remove them separately?
No, let's do that separately. I like keeping independent changes
separate in commits. I've tested the current patch already and it
seems to be fine. If we do the second LLSC_* removal patch it
shouldn't affect the generated binaries, which is easy to verify.
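The "easy to verify" part could look something like the following, sketched
with the host toolchain for illustration; for musl on sh you would build
libc.a with a sh4 cross toolchain before and after the patch and diff the
disassembly the same way (tool names and file names here are placeholders):

```shell
# Sketch: check that a pure refactor leaves the object code unchanged.
cat > before.c <<'EOF'
#define ADD(x, y) ((x) + (y))
int add(int x, int y) { return ADD(x, y); }
EOF
cat > after.c <<'EOF'
/* same function with the macro expanded away */
int add(int x, int y) { return x + y; }
EOF
cc -O2 -c before.c && cc -O2 -c after.c
# Drop the header lines naming the input file, then diff the disassembly.
objdump -d before.o | tail -n +3 > before.dis
objdump -d after.o | tail -n +3 > after.dis
diff before.dis after.dis && echo "identical code"
```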
>
> >
> > > ---
> > > arch/sh/atomic.h | 83 +++++++++++++++++++++++++++++++
> > > arch/sh/src/atomic.c | 135 +++++++++++++++++----------------------------------
> > > 2 files changed, 128 insertions(+), 90 deletions(-)
> > >
> > > diff --git a/arch/sh/atomic.h b/arch/sh/atomic.h
> > > index a1d22e4..f2e6dac 100644
> > > --- a/arch/sh/atomic.h
> > > +++ b/arch/sh/atomic.h
> > > @@ -22,6 +22,88 @@ static inline int a_ctz_64(uint64_t x)
> > > return a_ctz_l(y);
> > > }
> > >
> > > +#define LLSC_CLOBBERS "r0", "t", "memory"
> > > +#define LLSC_START(mem) "synco\n" \
> > > + "0: movli.l @" mem ", r0\n"
> > > +#define LLSC_END(mem) \
> > > + "1: movco.l r0, @" mem "\n" \
> > > + " bf 0b\n" \
> > > + " synco\n"
> > > +
> > > +static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> > > +{
> > > + int old;
> > > + __asm__ __volatile__(
> > > + LLSC_START("%1")
> > > + " mov r0, %0\n"
> > > + " cmp/eq %0, %2\n"
> > > + " bf 1f\n"
> > > + " mov %3, r0\n"
> > > + LLSC_END("%1")
> > > + : "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> > > + return old;
> > > +}
> >
> > The mov from r0 to %0 seems unnecessary here. Presumably it's because
> > you didn't have a constraint to force old to come out via r0. Could
> > you do the following?
>
> No, because the "mov %3, r0" a couple instructions down clobbers r0, and
> this is necessary because movco.l only accept r0 as the source operand.
Oh, I see. We lose the old value that the function needs to return. So
a version of cas that just returns success/failure rather than the old
value would be able to omit it, but musl's can't, right? I should keep
that in mind, since at some point, if I can determine that the old
value isn't important anywhere, I might consider changing the a_cas
API to just have success/failure as its result. This is mildly cheaper
on some other archs too, I think.
> > I've actually always wondered about the value of having the LLSC_*
> > macros. I usually prefer for the whole asm to be written out
> > explicitly and readable unless there's a compelling reason to wrap it
> > up in macros. Then it would look like:
>
> I think it made a bigger difference for the gusa version, so I mostly
> did it with llsc for consistency. And before this patch, when the gusa
> and llsc versions were side by side in the same function, it made it
> easier for me to verify both versions were doing the same thing as I
> wrote them.
>
> Now that the llsc version is moving, I'm less attached to the LLSC_*
> macros. I do think the gusa stuff is ugly and magical enough that I'd
> still prefer to keep it hidden away, if you don't object.
Yeah, I don't mind.
> > static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> > {
> > register int old __asm__("r0");
> > __asm__ __volatile__(
> > " synco\n"
> > "0: movli.l @%1, r0\n"
> > " cmp/eq r0, %2\n"
> > " bf 1f\n"
> > " mov %3, r0\n"
> > "1: movco.l r0, @%1\n"
> > " bf 0b\n"
> > " synco\n"
> > : "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
> > return old;
> > }
> >
> > and similar for other functions. Part of the motivation of not hiding
> > the outer logic in macros is that it might make it possible to
> > fold/simplify some special cases like I did above for CAS.
>
> I don't mind in principle, but I think the fact that movco.l requires
> its input be in r0 is going to mean there's not actually any
> simplification you can do.
I think you may be right.
> > Another idea is letting the compiler simplify, with something like the
> > following, which could actually be used cross-platform for all
> > llsc-type archs:
> >
> > static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> > {
> > do old = llsc_start(p);
> > while (*p == t && !llsc_end(p, s));
> > return old;
> > }
> >
> > Here llsc_start and llsc_end would be inline functions using asm with
> > appropriate constraints. Unfortunately I don't see a way to model
> > using the value of the truth flag "t" as the output of the asm for
> > llsc_end, though. I suspect this would be a problem on a number of
> > other archs too; the asm would have to waste an instruction (or
> > several) converting the flag to an integer. Unless there's a solution
> > to that problem, it makes an approach like this less appealing.
>
> I agree this would be even nicer if we could make it work.
Yes. FWIW the above code has a bug. It should be:
static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (old == t && !llsc_end(p, s));
	return old;
}
I think there would need to be a barrier before the return statement
too. Embedding the barrier in llsc_end would not work because (1) it
won't get executed on old!=t, and (2) it should be avoided when
llsc_end fails since llsc_start will do a barrier anyway, but if we
put the conditional inside llsc_end's asm then the condition would get
branched-on twice. Issue (2) may be solvable by using a C conditional
inside llsc_end, but that still leaves issue (1), so I think something
like this would be needed:
static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (old == t && !llsc_end(p, s));
	a_barrier();
	return old;
}
But that's suboptimal on archs where the 'sc' part of llsc has an
implicit barrier (microblaze and or1k, IIRC). So perhaps the ideal
general version would be:
static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (old == t ? !llsc_end(p, s) : (a_barrier(), 0));
	return old;
}
with a barrier in the success path for llsc_end. Alternatively we
could aim to always do the sc:
static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (!llsc_end(p, old == t ? s : old));
	return old;
}
This version is structurally analogous to the non-CAS atomics, but
perhaps more costly in the old!=t case.
Anyway, at this point I don't see an efficient way to do the
conditional, so this is mostly a theoretical topic.
> > For the GUSA stuff, do you really need the ODD/EVEN macros? I think
> > you could just add appropriate .align inside that would cause the
> > assembler to insert a nop if necessary.
>
> Should be possible. I'll work on it and send another patch.
Thanks!
Rich
Thread overview: 7+ messages
2015-05-17 18:55 Bobby Bingham
2015-05-18 2:34 ` Rich Felker
2015-05-18 4:28 ` Rich Felker
2015-05-18 22:56 ` Bobby Bingham
2015-05-19 0:30 ` Rich Felker [this message]
2015-05-19 2:12 ` Rich Felker
2015-05-19 4:52 ` Rich Felker