On Fri, Oct 20, 2023 at 8:02 PM Damian McGuckin <damianm@esi.com.au> wrote:

Which modern CPUs have a penalty for double-precision floating-point
arithmetic on scalars compared to single precision once the operands are in
registers, i.e. ignoring memory fetch issues?

I have Agner Fog's excellent document for x86-64, which basically says that
32-bit and 64-bit scalar operations take the same amount of time.

I am looking for the same type of information for ARM and RISC-V. I found the
data for 32-bit operations in the online documentation, but nothing about 64-bit.

I cannot find anything on this topic for RISC-V or POWER10.

Maybe I am not searching on the right terms.

Note that I am after the raw hardware performance, not, say, the relative
performance of the MUSL sin() routine compared with sinf().
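
To make that concrete, the kind of raw measurement I have in mind is a
serially dependent scalar chain, something like the throwaway C sketch below
(only an illustration, assuming a POSIX clock; build with plain "cc -O2" and
no -ffast-math so the chain is not reassociated):

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

static double now(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
}

#define N 100000000UL

int main(void)
{
        /* Each iteration depends on the previous result, so the loops time
         * per-operation latency in registers rather than memory traffic. */
        volatile float  fsink = 1.0f;   /* volatile keeps the results live */
        volatile double dsink = 1.0;

        float f = fsink;
        double t0 = now();
        for (unsigned long i = 0; i < N; i++)
                f = f * 1.0000001f + 1e-7f;     /* single-precision chain */
        double t1 = now();
        fsink = f;

        double d = dsink;
        double t2 = now();
        for (unsigned long i = 0; i < N; i++)
                d = d * 1.0000001 + 1e-7;       /* double-precision chain */
        double t3 = now();
        dsink = d;

        printf("float : %.3f ns/op\n", (t1 - t0) / N * 1e9);
        printf("double: %.3f ns/op\n", (t3 - t2) / N * 1e9);
        return 0;
}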

Have you looked at the scheduler descriptions for ARM, RISC-V and POWER in GCC or LLVM?
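
LLVM's scheduling models can also be queried directly with llvm-mca, which
prints the latency and throughput it takes from the model for a chosen -mcpu.
A rough sketch of that workflow (the target and CPU names here are only
examples; substitute the core you care about):

/* fpops.c - tiny kernel to inspect with llvm-mca.
 *
 *   clang -O2 -S --target=aarch64-linux-gnu -mcpu=neoverse-n1 fpops.c -o fpops.s
 *   llvm-mca -mtriple=aarch64-linux-gnu -mcpu=neoverse-n1 fpops.s
 *
 * llvm-mca reports per-instruction latency and reciprocal throughput for the
 * single- and double-precision multiplies and adds below, taken from the
 * same scheduling models that live in the .td files under
 * llvm/lib/Target/<arch>/.  GCC keeps the equivalent pipeline descriptions
 * in machine-description files under gcc/config/aarch64/, gcc/config/riscv/
 * and gcc/config/rs6000/ (e.g. power10.md). */

float  mulf(float a,  float b)  { return a * b; }
double muld(double a, double b) { return a * b; }
float  addf(float a,  float b)  { return a + b; }
double addd(double a, double b) { return a + b; }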

David