clang-p2996/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
Luke Lau 185ba025da [RISCV] Widen i1 AnyOf reductions (#134898)
With EVL tail folding, an AnyOf reduction ends up emitting an i1
vp.merge.

Unfortunately, because RVV has no tail-undisturbed mask instructions,
an i1 vp.merge gets expanded to a lengthy sequence:

```asm
  vsetvli a1, zero, e64, m1, ta, ma
  vid.v v10                        
  vmsltu.vx v10, v10, a0           
  vmand.mm v9, v9, v10             
  vmandn.mm v8, v8, v9             
  vmand.mm v9, v0, v9              
  vmor.mm v0, v9, v8               
```
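
As a rough lane-by-lane model of what this sequence computes, the sketch
below mirrors each mask instruction with a boolean operation. The
register roles (v0 = on-true operand, v8 = on-false operand, v9 =
condition, a0 = EVL) are inferred from the instruction sequence and are
an assumption, as is the helper name `expandedMaskMerge`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;

// Lane-by-lane model of the expanded i1 vp.merge.
// Assumed register roles: onTrue = v0, onFalse = v8, cond = v9, evl = a0.
Mask expandedMaskMerge(const Mask &cond, const Mask &onTrue,
                       const Mask &onFalse, std::size_t evl) {
  assert(cond.size() == onTrue.size() && cond.size() == onFalse.size());
  Mask result(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i) {
    bool inRange = i < evl;               // vid.v + vmsltu.vx
    bool take = cond[i] && inRange;       // vmand.mm  v9, v9, v10
    bool keptFalse = onFalse[i] && !take; // vmandn.mm v8, v8, v9
    bool keptTrue = onTrue[i] && take;    // vmand.mm  v9, v0, v9
    result[i] = keptTrue || keptFalse;    // vmor.mm   v0, v9, v8
  }
  return result;
}
```

Lanes at or beyond the EVL always keep the on-false value, which is the
tail-undisturbed behaviour the mask instructions have to emulate.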

This patch addresses the problem by matching this specific AnyOf
pattern in RISCVCodeGenPrepare and widening it from i1 to i8, which ends
up producing a single masked i8 vor.vi inside the loop. Before and after
the transform:

```llvm
loop:                                                                      
  %phi = phi <vscale x 4 x i1> [ zeroinitializer, %entry ], [ %rec, %loop ]
  %cmp = icmp ...                                                                                          
  %rec = call <vscale x 4 x i1> @llvm.vp.merge(%cmp, true, %phi, %evl)     
```

```llvm
loop:                                                                      
  %phi = phi <vscale x 4 x i8> [ zeroinitializer, %entry ], [ %rec, %loop ]
  %cmp = icmp ...                             
  %rec = call <vscale x 4 x i8> @llvm.vp.merge(%cmp, true, %phi, %evl)     
  %trunc = trunc <vscale x 4 x i8> %rec to <vscale x 4 x i1>               
```
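
To illustrate why the widening preserves the result, here is a small
scalar model of both loops. It is a sketch, not the pass itself: the
`Step` struct and both function names are hypothetical, and it assumes
the widened `true` operand is an i8 splat of 1 so that the final trunc
recovers the original i1 value from the low bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One loop iteration: the per-lane compare result and that iteration's EVL.
struct Step {
  std::vector<bool> cmp;
  std::size_t evl;
};

// Original form: i1 phi updated by vp.merge(%cmp, true, %phi, %evl).
std::vector<bool> anyOfI1(const std::vector<Step> &steps, std::size_t lanes) {
  std::vector<bool> phi(lanes, false); // zeroinitializer
  for (const Step &s : steps)
    for (std::size_t i = 0; i < lanes; ++i)
      if (i < s.evl && s.cmp[i])
        phi[i] = true; // lanes >= evl or with a false cmp keep %phi
  return phi;
}

// Widened form: i8 phi, same merge, then a trunc back to i1.
std::vector<bool> anyOfI8ThenTrunc(const std::vector<Step> &steps,
                                   std::size_t lanes) {
  std::vector<std::uint8_t> phi(lanes, 0); // zeroinitializer
  for (const Step &s : steps)
    for (std::size_t i = 0; i < lanes; ++i)
      if (i < s.evl && s.cmp[i])
        phi[i] = 1; // assumed i8 splat of 1 as the on-true operand
  std::vector<bool> out(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    out[i] = (phi[i] & 1) != 0; // trunc to i1 keeps the low bit
  return out;
}
```

Since each i8 lane only ever holds 0 or 1, truncating after the loop
gives the same mask as accumulating in i1 directly.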

I ended up adding this in RISCVCodeGenPrepare instead of the
LoopVectorizer itself, since doing it there would have required adding
a target hook.

It may also be possible to generalize this to other i1 vp.merges in
the future.

Normally the trunc will be sunk outside of the loop. However, the
transform doesn't check that all the non-phi users of the vp.merge are
outside of the loop: if there are in-loop users, widening still seems to
be profitable; see the test diff in `@widen_anyof_rdx_use_in_loop`.

Fixes #132180
2025-04-28 10:07:51 +08:00
