On Thu, 29 Aug 2024, Alex Rønne Petersen wrote:

> That aside, while the motivating issue doesn't (easily) reproduce with
> vanilla Clang, it's nonetheless still the case that Clang folds
> multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
> While this might work out in some cases, we've still basically lost at
> the LLVM IR level; we're at the mercy of the target backend in regards
> to whether it gets lowered to an actual FMA instruction or split back
> to the ~original FMUL + FADD. And this isn't even considering what
> other nonsense the optimizer pipeline might get up to before that.

Thank you for uncovering what was happening in LLVM! I agree there's
a backend bug, but the point is moot since disabling FMA contraction
globally is the way to go, as discussed in the longer sub-thread.
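To illustrate why contraction is observable at all, here is a minimal sketch, in Python purely for illustration; the function names are hypothetical. It compares a*b + c rounded once (what an FMA instruction computes) against the same expression rounded twice (FMUL then FADD). Code that depends on one of these behaviours gets a different answer if the compiler silently picks the other. In GCC and Clang, turning contraction off is typically done with -ffp-contract=off.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding, as an FMA instruction does.

    Fraction arithmetic is exact, and float(Fraction) rounds once to the
    nearest double. Assumes finite inputs (no inf/NaN handling).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Evaluate a*b + c as FMUL followed by FADD (two roundings).

    CPython never contracts this into an FMA, so the product is rounded
    to double before the addition.
    """
    product = a * b
    return product + c


a = 1.0 + 2.0**-27     # exactly representable as a double
c = -(1.0 + 2.0**-26)  # minus the double-rounded value of a*a

# Exactly, a*a = 1 + 2**-26 + 2**-54. The 2**-54 term is lost when the
# product is rounded to double, so the two-step version cancels to zero,
# while the single-rounding version keeps the residual.
print(separate_mul_add(a, a, c))
print(fused_mul_add(a, a, c))
```

The first call yields 0.0 and the second yields 2**-54. This is the kind of discrepancy that surfaces when an intended FMUL + FADD pair is fused, or when a requested FMA is split back apart.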

Alexander