From: Alex Rønne Petersen
Date: Thu, 29 Aug 2024 15:36:51 +0200
To: Alexander Monakov
Cc: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
References: <20240828152826.826990-1-alex@alexrp.com>
Subject: Re: [musl] [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)

On Wed, Aug 28, 2024 at 9:56 PM Alexander Monakov wrote:
>
>
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
>
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
>
> FWIW I can't seem to reproduce this issue. For optionally-fused multiply-add
> LLVM IR uses @llvm.fmuladd.f64, which under -mfloat-abi=soft is expanded via
> __aeabi_dmul + __aeabi_dadd. I'm quite unsure how you got LLVM to generate a
> call to fma in your circumstances.

Ok, I had to do some digging to figure out what was going on here.

The TL;DR is that the issue is *mostly* specific to Zig due to the way we model CPU features and pass them to Clang, and because of what's likely an Arm backend bug. You *can* technically reproduce it with vanilla Clang too, but you have to go far enough out of your way that I don't think it happens in practice.

In `zig cc`, we pass the full set of all possible CPU features to Clang via `-Xclang -target-feature -Xclang +/-` - basically bypassing the frontend driver. This means that when we target the default `armv8-a` CPU, a bunch of floating point features are enabled which the Clang driver normally explicitly disables when it sees `-mfloat-abi=soft`.

When we get to the Arm backend, `ARMTargetLowering::isFMAFasterThanFMulAndFAdd()` does *not* check the `use-soft-float` function attribute when deciding whether lowering to a real FMA instruction is worthwhile, so `SelectionDAGBuilder::visitIntrinsicCall()` decides to emit an `ISD::FMA` node. Later, due to `use-soft-float` being set, `DAGTypeLegalizer::SoftenFloatRes_FMA()` converts the `ISD::FMA` to a libcall. As was done for PowerPC, Arm's `isFMAFasterThanFMulAndFAdd()` should probably just be changed to check for soft float.

That aside, while the motivating issue doesn't (easily) reproduce with vanilla Clang, it's nonetheless still the case that Clang folds multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls. While this might work out in some cases, we've still basically lost at the LLVM IR level; we're at the mercy of the target backend with regard to whether it gets lowered to an actual FMA instruction or split back to the ~original FMUL + FADD. And this isn't even considering what other nonsense the optimizer pipeline might get up to before that.
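
To make the fused-versus-split distinction concrete, here is a minimal C sketch. It is not taken from musl or from the patch: the function names are hypothetical, and the `volatile` store is just one way to pin the product to its own rounding step inside a single function. Disabling contraction for a whole build is usually done with the `-ffp-contract=off` flag that GCC and Clang provide; this message does not say which mechanism the patch uses.

```c
/* Hypothetical helper (not from musl or the patch): computes a * b + c with
 * the product rounded to double before the addition. The volatile store
 * keeps the compiler from contracting the two operations into one fused
 * multiply-add. */
double mul_then_add(double a, double b, double c)
{
    volatile double product = a * b;  /* first rounding happens here */
    return product + c;               /* second rounding happens here */
}

/* Inputs chosen so the extra rounding is visible. With e = 2^-27,
 * (1 + e) * (1 - e) is exactly 1 - 2^-54, which rounds to 1.0 in double,
 * so the split evaluation returns 1.0 - 1.0 = 0.0. */
double split_demo(void)
{
    const double e = 0x1p-27;
    return mul_then_add(1.0 + e, 1.0 - e, -1.0);
}

/* The value a single-rounding (fused) evaluation of the same inputs would
 * produce: the exact result -(e * e) = -2^-54, which is representable in
 * double, so no libm call is needed to write it down. */
double fused_value_for_demo(void)
{
    const double e = 0x1p-27;
    return -(e * e);
}
```

Comparing `split_demo()` with `fused_value_for_demo()` shows that the two lowerings are not interchangeable: the split form loses the low bits of the product, which is why code that depends on a specific rounding sequence cannot leave the choice to the backend.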