On Thu, 29 Aug 2024, Alex Rønne Petersen wrote:

> That aside, while the motivating issue doesn't (easily) reproduce with
> vanilla Clang, it's nonetheless still the case that Clang folds
> multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
> While this might work out in some cases, we've still basically lost at
> the LLVM IR level; we're at the mercy of the target backend in regards
> to whether it gets lowered to an actual FMA instruction or split back
> to the ~original FMUL + FADD. And this isn't even considering what
> other nonsense the optimizer pipeline might get up to before that.

Thank you for uncovering what was happening in LLVM! I agree there's a
backend bug, but the point is moot since disabling FMA contraction
globally is the way to go, as discussed in the longer sub-thread.

Alexander