mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: Markus Wichmann <nullplan@gmx.net>
Cc: musl@lists.openwall.com
Subject: Re: [musl] x86 fma with run-time switch?
Date: Fri, 15 Mar 2024 17:36:23 -0400
Message-ID: <20240315213622.GG4163@brightrain.aerifal.cx>
In-Reply-To: <ZfSLN8KuN3AwhxV7@voyager>

On Fri, Mar 15, 2024 at 06:53:59PM +0100, Markus Wichmann wrote:
> Hi all,
> 
> in commit e9016138, Szabolcs wrote into the message that we really
> should be using the single-instruction versions if possible, and we
> should be switching at run time. I have an idea for how to do that
> without losing all of the history of the generic fma.c:
> 
> - Rename src/math/fma.c to src/math/fma-soft.h. Rename the fma function
>   inside to fma_soft and make it static (inline?).
> - Create a new src/math/fma.c that includes fma-soft.h and just calls
>   fma_soft().
> - In src/math/x86_64/fma.c: Unconditionally define fma_fma() and
>   fma_fma4() (which are the current assembler versions) and include
>   fma-soft.h. Create a dispatcher to figure out which version to call,
>   and call that from fma().

You're making it too complicated. Just

#define fma __soft_fma
#include "../fma.c"

or similar.
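
To make the shape of that concrete, here is a single-file imitation of
the rename-and-include trick. It is only a sketch: the names demo_fma,
soft_demo_fma and have_hw_fma are hypothetical, the "included" generic
file is pasted inline so the example compiles on its own, and its body
is a placeholder rather than a real software fma.

```c
/* Single-file imitation of the rename-and-include trick.
 * All names (demo_fma, soft_demo_fma, have_hw_fma) are hypothetical. */

/* Step 1: rename the generic definition while it is being compiled. */
#define demo_fma soft_demo_fma

/* ---- stands in for: #include "../fma.c" ---- */
double demo_fma(double x, double y, double z)
{
	/* Placeholder body only: a real software fma rounds exactly once,
	 * which plain x*y+z does not guarantee. */
	return x * y + z;
}
/* ---- end of the "included" generic file ---- */

/* Step 2: drop the rename so the public name can be defined here. */
#undef demo_fma

static int have_hw_fma; /* hypothetical flag; 0 means "use software" */

double demo_fma(double x, double y, double z)
{
	if (have_hw_fma) {
		/* an arch-specific fast path would go here */
	}
	/* The generic code is reachable under its renamed symbol. */
	return soft_demo_fma(x, y, z);
}
```

The generic source file stays untouched, so its history is preserved,
and the arch file decides what the public symbol does.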

> Yeah, I know, the header file with stuff in it that takes memory is not
> exactly great, but I can't think of another way to define the generic
> version such that it is accessible to the arch-specific versions under a
> different name and linkage. The file must not be a .c file, or else it
> will confuse the build system.
> 
> Question I have right out the gate is whether this would be interesting
> to the group. Second question is whether it is better to be running
> cpuid every time fma() is called, or to use a function pointer? I am
> partial to the dispatcher pattern myself. In that case, the function
> pointer is initialized at load time to point to the dispatcher, which
> then selects the best implementation and updates the function pointer.
> The main function only unconditionally calls the function pointer.

My expectation was that you would just use __hwcap: a hidden global
access is no more expensive than loading a function pointer for an
indirect branch, and local direct branches based on testing its bits
are likely cheaper, something like:

	if (__hwcap & WHATEVER) {
		__asm__(...);
		return ...;
	} else return __soft_fma(...);

However, it looks like x86_64 lacks usable __hwcap.
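
A slightly fuller sketch of that branch-on-bits shape, under loud
assumptions: cpu_caps stands in for a __hwcap-like word that something
else would have to fill in, CAP_FMA3 is a made-up bit position, and
soft_fma_demo is a placeholder rather than a correctly rounded software
fma. The asm line is the usual single-instruction FMA3 form and is only
compiled on x86_64 with GCC-style inline asm.

```c
/* Hypothetical names throughout: cpu_caps, CAP_FMA3, soft_fma_demo, fma_demo. */
#define CAP_FMA3 (1UL << 0)    /* made-up bit, not a real hwcap value */

/* Stand-in for a __hwcap-like word filled in at startup; 0 = no features. */
static unsigned long cpu_caps;

static double soft_fma_demo(double x, double y, double z)
{
	/* Placeholder only: not a correctly rounded software fma. */
	return x * y + z;
}

double fma_demo(double x, double y, double z)
{
	if (cpu_caps & CAP_FMA3) {
#if defined(__x86_64__) && defined(__GNUC__)
		/* x = x*y + z with a single rounding (AT&T operand order). */
		__asm__ ("vfmadd132sd %1, %2, %0" : "+x"(x) : "x"(y), "x"(z));
		return x;
#endif
	}
	/* Direct local call; no function pointer involved. */
	return soft_fma_demo(x, y, z);
}
```

With cpu_caps left at zero, every call takes the software path, which
is the safe default until the bits are known.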

> With a bit of preprocessor magic, I can also ensure that if __FMA__ or
> __FMA4__ are set, the dispatcher is not included, and only the given
> function is called. Although that may incur a warning of an unused
> static function. I suppose that is a problem that can be fixed with more
> preprocessor magic.

We already do that.

> From my preliminary research, the fma3 and fma4 ISA extensions require
> no kernel support, so this will be the first time a CPUID call is
> needed. fma3 support is signalled with bit 12 of ECX in CPUID function
> 1. fma4 support is signalled with bit 16 of ECX in CPUID function
> 0x80000001 - on AMD CPUs. Intel has the bit reserved, so to be extra
> safe, the CPU vendor ought to be checked, too.
> 
> Doing the same for i386 requires also verifying kernel SSE support in
> hwcap (that implies CPUID support in the CPU, since the baseline 80486
> does not necessarily have that, but all CPUs with SSE have it) and also
> support for extended CPUID in case of fma4.

That seems a lot less likely to be worthwhile since it involves
shuffling data back and forth between x87 and sse registers, but maybe
it's still a big enough win to want to do it?

The logic for how you determine availability seems right.
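
For reference, the CPUID checks described above could be sketched as
follows. This assumes the <cpuid.h> helper header shipped with GCC and
Clang; the function names cpu_has_fma3 and cpu_has_fma4 are
hypothetical. It tests only the CPUID bits mentioned in the quoted text
and does not cover whether the OS has enabled the register state the
instructions use, nor the i386 hwcap check.

```c
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Hypothetical helper: 1 if CPUID.1:ECX bit 12 (FMA3) is set, else 0. */
int cpu_has_fma3(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned a, b, c, d;
	if (!__get_cpuid(1, &a, &b, &c, &d))
		return 0;
	return (c >> 12) & 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: 1 if the vendor is AMD and
 * CPUID.0x80000001:ECX bit 16 (FMA4) is set, else 0. */
int cpu_has_fma4(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned a, b, c, d;
	char vendor[12];

	/* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
	if (!__get_cpuid(0, &a, &b, &c, &d))
		return 0;
	memcpy(vendor + 0, &b, 4);
	memcpy(vendor + 4, &d, 4);
	memcpy(vendor + 8, &c, 4);
	if (memcmp(vendor, "AuthenticAMD", 12) != 0)
		return 0;

	/* __get_cpuid() returns 0 if the extended leaf is unsupported. */
	if (!__get_cpuid(0x80000001, &a, &b, &c, &d))
		return 0;
	return (c >> 16) & 1;
#else
	return 0;
#endif
}
```

Both helpers return 0 on non-x86 targets, so callers fall through to
the generic code there.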

> Since the CPUID challenges would be shared between fma and fmaf, I would
> like to put them into a new header file in src/include (maybe create
> src/include/x86_64? Or should it be added to arch/x86_64?)
> 
> So what are your thoughts on this?

I'm somewhat skeptical of the value of doing this for fma in
particular. There are probably many other places where we have no
runtime-conditional optimized code and where it would have higher
returns (memcpy etc. being the most obvious), and it seems likely that
programs that care about fma performance are themselves compiled with
the right ISA levels and use the compiler builtin, never calling the
function at all.

Regardless, I wonder if we should have the x86_64 startup code store
cpuid result somewhere we can use, so that we don't have to do nasty
atomic stuff determining it late, and could just branch on the bits of
a runtime-constant like we would with __hwcap. This would set the
stage for doing the same with more-impactful things like mem* too.
(That's a big project that involves designing the system for how archs
define the large-block-ops the generic C functions would use, so not
immediately applicable, but useful to be working towards it.)
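
A rough sketch of the store-once idea, with stated assumptions: a
GCC/Clang constructor stands in for the real startup code, cpuid1_ecx is
a hypothetical global (in a libc it would be a hidden symbol), and
use_hw_fma is a hypothetical helper. Because the word is written before
any other code runs, later readers need no atomics and just test bits.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Hypothetical cache of CPUID leaf 1 ECX; stays 0 if CPUID is unavailable. */
static unsigned cpuid1_ecx;

/* Stand-in for startup code: with GCC/Clang this runs before main(). */
__attribute__((constructor))
static void init_cpuid_cache(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned a, b, c, d;
	if (__get_cpuid(1, &a, &b, &c, &d))
		cpuid1_ecx = c;
#endif
}

#define ECX_FMA3 (1u << 12)   /* CPUID.1:ECX bit 12 */

/* Hypothetical helper: plain load plus bit test, no late initialization. */
int use_hw_fma(void)
{
	return (cpuid1_ecx & ECX_FMA3) != 0;
}
```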

Rich
