Re: [PATCH] x86: optimize fp_arch.h


From: Szabolcs Nagy <nsz@port70.net>
To: musl@lists.openwall.com
Subject: Re: [PATCH] x86: optimize fp_arch.h
Date: Thu, 25 Apr 2019 10:53:54 +0200
Message-ID: <20190425085353.GI26605@port70.net>
In-Reply-To: <20190425020108.GP23599@brightrain.aerifal.cx>

* Rich Felker <dalias@libc.org> [2019-04-24 22:01:08 -0400]:
> On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> > tested on x86_64 and i386
> 
> > From 5f97370ff3e94bea812ec123a31d7482965a3b1b Mon Sep 17 00:00:00 2001
> > From: Szabolcs Nagy <nsz@port70.net>
> > Date: Wed, 24 Apr 2019 23:29:05 +0000
> > Subject: [PATCH] x86: optimize fp_arch.h
> > 
> > Use fp register constraint instead of volatile store when sse2 math is
> > available, and use memory constraint when only x87 fpu is available.
> > ---
> >  arch/i386/fp_arch.h   | 31 +++++++++++++++++++++++++++++++
> >  arch/x32/fp_arch.h    | 25 +++++++++++++++++++++++++
> >  arch/x86_64/fp_arch.h | 25 +++++++++++++++++++++++++
> >  3 files changed, 81 insertions(+)
> >  create mode 100644 arch/i386/fp_arch.h
> >  create mode 100644 arch/x32/fp_arch.h
> >  create mode 100644 arch/x86_64/fp_arch.h
> > 
> > diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
> > new file mode 100644
> > index 00000000..b4019de2
> > --- /dev/null
> > +++ b/arch/i386/fp_arch.h
> > @@ -0,0 +1,31 @@
> > +#ifdef __SSE2_MATH__
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> > +#else
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> > +#endif
> 
> I guess for float and double you need the "m" constraint to ensure
> that a broken compiler doesn't skip dropping of precision (although I
> still wish we didn't bother with complexity to support that, and just
> relied on cast working correctly), but at least for long double
> couldn't we use an x87 register constraint to avoid the spill to
> memory?

i think fp_barrier does not have to drop excess precision:
it is supposed to be an identity op that is hidden from
the compiler, e.g. to prevent constant folding or hoisting.
fp_force_eval, on the other hand, is used to force side-effects
that may only happen if the excess precision is dropped.
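
a minimal sketch of the identity-op idea, assuming gcc-style inline
asm. the macro is copied from the quoted i386 hunk; barrier_double,
quarter and barrier_selfcheck are hypothetical names used only for
illustration, not code from the patch:

```c
#include <assert.h>

/* barrier macro as in the quoted i386 hunk */
#ifdef __SSE2_MATH__
#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
#else
#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
#endif

/* hypothetical wrapper: returns x unchanged, but the empty asm
   makes the value opaque to the optimizer */
static inline double barrier_double(double x)
{
	FP_BARRIER(x);
	return x;
}

/* hypothetical use: the division cannot be folded at compile time
   because the compiler no longer knows the operand is 1.0 */
static inline double quarter(void)
{
	double one = barrier_double(1.0);
	return one / 4.0;
}

/* hypothetical self-check: the barrier must not change the value */
static inline void barrier_selfcheck(void)
{
	assert(barrier_double(2.5) == 2.5);
	assert(quarter() == 0.25);
}
```

the values here are exactly representable, so the checks hold with
or without excess precision; only the compiler's view of the operand
changes.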

i think modern gcc drops excess precision at arg passing
in standard mode (-fexcess-precision=standard), so "+m" is not
strictly needed there, but it makes the code behave the same in
non-standard mode too.

and yes, the long double version could use "+t", maybe i should
add that (the patch saves about 400 bytes of .text by removing the
volatile load/store overhead).
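
a sketch of what such a long double variant could look like, assuming
an x86 target where long double is kept in x87 registers. FP_BARRIERL,
barrier_long_double and barrier_long_double_selfcheck are hypothetical
names, not part of the posted patch, and the "+m" fallback is only
there so the sketch also builds on other targets:

```c
#include <assert.h>

/* "t" is the top of the x87 register stack, so the value can stay
   in a register instead of being spilled to memory */
#if defined(__i386__) || defined(__x86_64__)
#define FP_BARRIERL(x) __asm__ __volatile__ ("" : "+t"(x))
#else
/* assumption: generic fallback for non-x86 targets */
#define FP_BARRIERL(x) __asm__ __volatile__ ("" : "+m"(x))
#endif

/* hypothetical wrapper: identity op hidden from the compiler */
static inline long double barrier_long_double(long double x)
{
	FP_BARRIERL(x);
	return x;
}

/* hypothetical self-check: the value passes through unchanged */
static inline void barrier_long_double_selfcheck(void)
{
	assert(barrier_long_double(1.5L) == 1.5L);
}
```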
