mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
@ 2024-08-28 15:28 Alex Rønne Petersen
  2024-08-28 15:53 ` Alexander Monakov
  2024-08-28 19:56 ` Alexander Monakov
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Rønne Petersen @ 2024-08-28 15:28 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen

I've seen Clang do this for expressions in the fma() implementation itself,
which of course led to infinite recursion. This happened when targeting
arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
it's possible for GCC to do similar silliness.

Work around this by passing -ffp-contract=off for Clang and -mno-fused-madd
for GCC. This matches what glibc's configure.ac does, FWIW.
---
 configure | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/configure b/configure
index bc9fbe48..7028793f 100755
--- a/configure
+++ b/configure
@@ -355,6 +355,15 @@ tryflag CFLAGS_C99FSE -fexcess-precision=standard \
 || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
 tryflag CFLAGS_C99FSE -frounding-math
 
+#
+# Prevent the compiler from turning a * b + c into an fma() call.
+# Clang at least has been known to do this in the implementation of
+# fma() itself when targeting arm-linux-musleabi and armv8-a, causing
+# infinite recursion.
+#
+tryflag CFLAGS_C99FSE -mno-fused-madd
+tryflag CFLAGS_C99FSE -ffp-contract=off
+
 #
 # Semantically we want to insist that our sources follow the
 # C rules for type-based aliasing, but most if not all real-world
-- 
2.40.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 15:28 [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c) Alex Rønne Petersen
@ 2024-08-28 15:53 ` Alexander Monakov
  2024-08-28 16:31   ` Alex Rønne Petersen
  2024-08-28 20:15   ` Rich Felker
  2024-08-28 19:56 ` Alexander Monakov
  1 sibling, 2 replies; 11+ messages in thread
From: Alexander Monakov @ 2024-08-28 15:53 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]


On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:

> I've seen Clang do this for expressions in the fma() implementation itself,
> which of course led to infinite recursion. This happened when targeting
> arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
> it's possible for GCC to do similar silliness.

musl passes -std=c99 to the compiler, and in all GCC releases so far* that
disables FMA contraction (as opposed to -std=gnu99 or whichever -std=gnuXX
is enabled by default, where unrestricted contraction is implicitly enabled,
i.e. the non-standard and dangerous -ffp-contract=fast mode).

Clang respects #pragma STDC FP_CONTRACT OFF, so that is available as
a smaller hammer than disabling fma across the board. Breaking up
contractable expression in fma*.c will work too.

[*] maybe modulo bugs in old releases where the backend doesn't respect
-ffp-contract=off and which Glibc worked around with -mno-fused-madd.

Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 15:53 ` Alexander Monakov
@ 2024-08-28 16:31   ` Alex Rønne Petersen
  2024-08-28 20:15   ` Rich Felker
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Rønne Petersen @ 2024-08-28 16:31 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl

On Wed, Aug 28, 2024 at 5:53 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
>
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
> > it's possible for GCC to do similar silliness.
>
> musl passes -std=c99 to the compiler, and in all GCC releases so far* that
> disables FMA contraction (as opposed to -std=gnu99 or whichever -std=gnuXX
> is enabled by default, where unrestricted contraction is implicitly enabled,
> i.e. the non-standard and dangerous -ffp-contract=fast mode).

OK, that part can definitely be left out. I only added it for good
measure; I haven't actually seen the issue occur with GCC.

>
> Clang respects #pragma STDC FP_CONTRACT OFF, so that is available as
> a smaller hammer than disabling fma across the board. Breaking up
> contractable expression in fma*.c will work too.

That pragma indeed seems to work. I suppose we could just add it to
the fma*() implementations and call it a day? I'm unsure if there are
other functions that might be affected by this.

>
> [*] maybe modulo bugs in old releases where the backend doesn't respect
> -ffp-contract=off and which Glibc worked around with -mno-fused-madd.
>
> Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 15:28 [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c) Alex Rønne Petersen
  2024-08-28 15:53 ` Alexander Monakov
@ 2024-08-28 19:56 ` Alexander Monakov
  2024-08-29 13:36   ` Alex Rønne Petersen
  1 sibling, 1 reply; 11+ messages in thread
From: Alexander Monakov @ 2024-08-28 19:56 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]


On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:

> I've seen Clang do this for expressions in the fma() implementation itself,
> which of course led to infinite recursion. This happened when targeting
> arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine

FWIW I can't seem to reproduce this issue. For optionally-fused multiply-add
LLVM IR uses @llvm.fmuladd.f64, which under -mfloat-abi=soft is expanded via
__aeabi_dmul + __aeabi_dadd. I'm quite unsure how you got LLVM to generate a
call to fma in your circumstances.

Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 15:53 ` Alexander Monakov
  2024-08-28 16:31   ` Alex Rønne Petersen
@ 2024-08-28 20:15   ` Rich Felker
  2024-08-28 20:32     ` Alexander Monakov
  1 sibling, 1 reply; 11+ messages in thread
From: Rich Felker @ 2024-08-28 20:15 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl, Alex Rønne Petersen

On Wed, Aug 28, 2024 at 06:53:30PM +0300, Alexander Monakov wrote:
> 
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
> 
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
> > it's possible for GCC to do similar silliness.
> 
> musl passes -std=c99 to the compiler, and in all GCC releases so far* that
> disables FMA contraction (as opposed to -std=gnu99 or whichever -std=gnuXX
> is enabled by default, where unrestricted contraction is implicitly enabled,
> i.e. the non-standard and dangerous -ffp-contract=fast mode).
> 
> Clang respects #pragma STDC FP_CONTRACT OFF, so that is available as
> a smaller hammer than disabling fma across the board. Breaking up
> contractable expression in fma*.c will work too.
> 
> [*] maybe modulo bugs in old releases where the backend doesn't respect
> -ffp-contract=off and which Glibc worked around with -mno-fused-madd.

I would rather havee it globally off, as the intent is that the code
be fully deterministic for a given ABI (excess precision being the
main thing that could differ by ABI), not randomly varying according
to how a compiler decides to optimize.

With that said, I don't see how the compiler could ever generate calls
to fma(), since we're -ffreestanding, but I think it could generate
inline fma instructions on targets where they're available, and this
is generally undesirable (see above). For gcc, I think -std=c99
already has that covered, but I'm not sure what clang does.

Rich

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 20:15   ` Rich Felker
@ 2024-08-28 20:32     ` Alexander Monakov
  2024-08-28 20:47       ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Monakov @ 2024-08-28 20:32 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen


On Wed, 28 Aug 2024, Rich Felker wrote:

> With that said, I don't see how the compiler could ever generate calls
> to fma(), since we're -ffreestanding, but I think it could generate
> inline fma instructions on targets where they're available, and this
> is generally undesirable (see above). For gcc, I think -std=c99
> already has that covered, but I'm not sure what clang does.

Modern Clang implements the FP_CONTRACT pragma, with default state "on".
Of course, the -ffp-contract= command-line switch is also available.

Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 20:32     ` Alexander Monakov
@ 2024-08-28 20:47       ` Rich Felker
  2024-08-28 21:11         ` Alexander Monakov
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2024-08-28 20:47 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl, Alex Rønne Petersen

On Wed, Aug 28, 2024 at 11:32:03PM +0300, Alexander Monakov wrote:
> 
> On Wed, 28 Aug 2024, Rich Felker wrote:
> 
> > With that said, I don't see how the compiler could ever generate calls
> > to fma(), since we're -ffreestanding, but I think it could generate
> > inline fma instructions on targets where they're available, and this
> > is generally undesirable (see above). For gcc, I think -std=c99
> > already has that covered, but I'm not sure what clang does.
> 
> Modern Clang implements the FP_CONTRACT pragma, with default state "on".
> Of course, the -ffp-contract= command-line switch is also available.

Even with -std=c99? In that case we should add the proposed:

tryflag CFLAGS_C99FSE -ffp-contract=off

I'm not clear on why the -mno-fused-madd would be needed. It should be
a no-op with -ffp-contract=off unless __builtin_fma were called
explicitly or something, no?

Rich

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 20:47       ` Rich Felker
@ 2024-08-28 21:11         ` Alexander Monakov
  2024-08-29 13:37           ` Alex Rønne Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Monakov @ 2024-08-28 21:11 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen


On Wed, 28 Aug 2024, Rich Felker wrote:

> I'm not clear on why the -mno-fused-madd would be needed. It should be
> a no-op with -ffp-contract=off unless __builtin_fma were called
> explicitly or something, no?

I think Glibc was applying it as a workaround for old GCC releases where
target backends were doing unrestricted contraction, or -ffp-contract=
option did not exist in the first place.

Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 19:56 ` Alexander Monakov
@ 2024-08-29 13:36   ` Alex Rønne Petersen
  2024-08-29 15:09     ` Alexander Monakov
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Rønne Petersen @ 2024-08-29 13:36 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl

On Wed, Aug 28, 2024 at 9:56 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
>
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
>
> FWIW I can't seem to reproduce this issue. For optionally-fused multiply-add
> LLVM IR uses @llvm.fmuladd.f64, which under -mfloat-abi=soft is expanded via
> __aeabi_dmul + __aeabi_dadd. I'm quite unsure how you got LLVM to generate a
> call to fma in your circumstances.

Ok, I had to do some digging to figure out what was going on here. The
TL;DR is that the issue is *mostly* specific to Zig due to the way we
model CPU features and pass them to Clang, and because of what's
likely an Arm backend bug. You *can* technically reproduce it with
vanilla Clang too, but you have to go far enough out of your way that
I don't think it happens in practice.

In `zig cc`, we pass the full set of all possible CPU features to
Clang via `-Xclang -target-feature -Xclang +/-<name>` - basically
bypassing the frontend driver. This means that when we target the
default `armv8-a` CPU, a bunch of floating point features are enabled
which the Clang driver normally explicitly disables when it sees
`-mfloat-abi=soft`. When we get to the Arm backend,
`ARMTargetLowering::isFMAFasterThanFMulAndFAdd()` does *not* check the
`use-soft-float` function attribute when deciding whether lowering to
a real FMA instruction is worthwhile, so
`SelectionDAGBuilder::visitIntrinsicCall()` decides to emit an
`ISD::FMA` node. Later, due `use-soft-float` being set,
`DAGTypeLegalizer::SoftenFloatRes_FMA()` converts the `ISD::FMA` to a
libcall.

Like was done for PowerPC, Arm's `isFMAFasterThanFMulAndFAdd()` should
probably just be changed to check for soft float.

That aside, while the motivating issue doesn't (easily) reproduce with
vanilla Clang, it's nonetheless still the case that Clang folds
multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
While this might work out in some cases, we've still basically lost at
the LLVM IR level; we're at the mercy of the target backend in regards
to whether it gets lowered to an actual FMA instruction or split back
to the ~original FMUL + FADD. And this isn't even considering what
other nonsense the optimizer pipeline might get up to before that.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-28 21:11         ` Alexander Monakov
@ 2024-08-29 13:37           ` Alex Rønne Petersen
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Rønne Petersen @ 2024-08-29 13:37 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl

On Wed, Aug 28, 2024 at 11:11 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 28 Aug 2024, Rich Felker wrote:
>
> > I'm not clear on why the -mno-fused-madd would be needed. It should be
> > a no-op with -ffp-contract=off unless __builtin_fma were called
> > explicitly or something, no?
>
> I think Glibc was applying it as a workaround for old GCC releases where
> target backends were doing unrestricted contraction, or -ffp-contract=
> option did not exist in the first place.

That's right. FWIW: It was added to glibc 12 years ago. I checked
current GCC sources and I can find no indication that the
`-mno-fused-madd` is necessary today; the optimizer looks to be
respecting `-ffp-contract=off`. If we don't care about ancient GCC
versions, I think it'd be completely reasonable to leave out that
flag.

>
> Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)
  2024-08-29 13:36   ` Alex Rønne Petersen
@ 2024-08-29 15:09     ` Alexander Monakov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Monakov @ 2024-08-29 15:09 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 834 bytes --]


On Thu, 29 Aug 2024, Alex Rønne Petersen wrote:

> That aside, while the motivating issue doesn't (easily) reproduce with
> vanilla Clang, it's nonetheless still the case that Clang folds
> multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
> While this might work out in some cases, we've still basically lost at
> the LLVM IR level; we're at the mercy of the target backend in regards
> to whether it gets lowered to an actual FMA instruction or split back
> to the ~original FMUL + FADD. And this isn't even considering what
> other nonsense the optimizer pipeline might get up to before that.

Thank you for uncovering what was happening in LLVM! I agree there's
a backend bug, but the point is moot since disabling FMA contraction
globally is the way to go, as discussed in the longer sub-thread.

Alexander

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-08-29 15:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-28 15:28 [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c) Alex Rønne Petersen
2024-08-28 15:53 ` Alexander Monakov
2024-08-28 16:31   ` Alex Rønne Petersen
2024-08-28 20:15   ` Rich Felker
2024-08-28 20:32     ` Alexander Monakov
2024-08-28 20:47       ` Rich Felker
2024-08-28 21:11         ` Alexander Monakov
2024-08-29 13:37           ` Alex Rønne Petersen
2024-08-28 19:56 ` Alexander Monakov
2024-08-29 13:36   ` Alex Rønne Petersen
2024-08-29 15:09     ` Alexander Monakov

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).