clang-p2996/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Shilei Tian 0b50d095bc [AMDGPU] Don't optimize agpr phis if the operand doesn't have subreg use (#91267)
If the operand doesn't have any subreg use, the optimization can
generate `V_ACCVGPR_READ_B32_e64` with the wrong register class. The
following example demonstrates the issue.

Input MIR:

```
bb.0:
  %0:sgpr_32 = S_MOV_B32 0
  %1:sgpr_128 = REG_SEQUENCE %0:sgpr_32, %subreg.sub0, %0:sgpr_32, %subreg.sub1, %0:sgpr_32, %subreg.sub2, %0:sgpr_32, %subreg.sub3
  %2:vreg_128 = COPY %1:sgpr_128
  %3:areg_128 = COPY %2:vreg_128, implicit $exec

bb.1:
  %4:areg_128 = PHI %3:areg_128, %bb.0, %6:areg_128, %bb.1
  %5:areg_128 = PHI %3:areg_128, %bb.0, %7:areg_128, %bb.1
  ...
```

Output of current implementation:

```
bb.0:
  %0:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
  %1:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
  %2:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
  %3:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
  %4:areg_128 = REG_SEQUENCE %0:agpr_32, %subreg.sub0, %1:agpr_32, %subreg.sub1, %2:agpr_32, %subreg.sub2, %3:agpr_32, %subreg.sub3
  %5:vreg_128 = V_ACCVGPR_READ_B32_e64 %4:areg_128, implicit $exec
  %6:areg_128 = COPY %5:vreg_128

bb.1:
  %7:areg_128 = PHI %6:areg_128, %bb.0, %9:areg_128, %bb.1
  %8:areg_128 = PHI %6:areg_128, %bb.0, %10:areg_128, %bb.1
  ...
```

The problem is the generated `V_ACCVGPR_READ_B32_e64` instruction:
its operand `%4:areg_128` is a 128-bit register, which is not valid here.

This patch no longer counts non-subreg uses, because
`V_ACCVGPR_READ_B32_e64` can't handle a non-32-bit operand.
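
The filtering idea can be sketched outside LLVM as a small standalone
model. This is not the code in `SIFoldOperands.cpp`: `PhiOperand`,
`RegSubRegKey` and `collectSubRegUses` are hypothetical names introduced
only for illustration, and the sketch assumes that a subregister index of 0
means "the whole register is used".

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one incoming PHI operand; not an LLVM type.
struct PhiOperand {
  unsigned Reg;    // virtual register number
  unsigned SubReg; // subregister index; 0 is assumed to mean "no subreg"
};

// Key used to group uses: (register, subregister index).
using RegSubRegKey = std::pair<unsigned, unsigned>;

// Group PHI operands by (Reg, SubReg), skipping whole-register uses.
// Only the grouped uses would be candidates for a 32-bit AGPR read.
std::map<RegSubRegKey, std::vector<const PhiOperand *>>
collectSubRegUses(const std::vector<PhiOperand> &Operands) {
  std::map<RegSubRegKey, std::vector<const PhiOperand *>> RegToUses;
  for (const PhiOperand &Op : Operands) {
    if (Op.SubReg == 0)
      continue; // whole-register use: not a 32-bit value, so do not count it
    RegToUses[{Op.Reg, Op.SubReg}].push_back(&Op);
  }
  return RegToUses;
}
```

In this model, two whole-register uses of register 3 (as in the PHIs of
`bb.1` above) produce no entry at all, so nothing would trigger the
rewrite, while repeated uses of the same subregister of one register are
still grouped together.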

Fixes: SWDEV-459556
2024-05-07 16:44:00 -04:00
