[DAG] Prefer 0.0 over -0.0 as neutral value for FADD w/NoSignedZero (#106616)

When getting a neutral value, we can prefer using a positive zero over a negative zero if nsz is set on the FADD (or reduction). A positive zero should be cheaper to materialize on basically all targets. Arguably, we should be doing this kind of canonicalization in DAGCombine, but we don't do that for any of the other reduction variants, so this seems like path of least resistance. This does mean that we can only do this for "fast" reductions. Just nsz isn't enough, as that goes through the SEQ_FADD path where the IR level start value isn't folded away. If folks think this is to RISCV specific, let me know. There's a trivial RISCV specific implementation. I went with the generic one as I through this might benefit other targets.
2024-08-30 07:56:14 -07:00
parent a3816b5a57
commit 924907bc6a
2 changed files with 4 additions and 3 deletions
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -13267,7 +13267,9 @@ SDValue SelectionDAG::getNeutralElement(unsigned Opcode, const SDLoc &DL,
  case ISD::SMIN:
    return getConstant(APInt::getSignedMaxValue(VT.getSizeInBits()), DL, VT);
  case ISD::FADD:
-    return getConstantFP(-0.0, DL, VT);
+    // If flags allow, prefer positive zero single it's generally cheaper
+    // to materialize on most targets.
+    return getConstantFP(Flags.hasNoSignedZeros() ? 0.0 : -0.0, DL, VT);
  case ISD::FMUL:
    return getConstantFP(1.0, DL, VT);
  case ISD::FMINNUM:
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
@@ -524,8 +524,7 @@ define float @vreduce_fadd_v7f32_neutralstart_fast(ptr %x) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
 ; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    lui a0, 524288
-; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.s.x v10, zero
 ; CHECK-NEXT:    vfredusum.vs v8, v8, v10
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret