With EVL tail folding, an AnyOf reduction will end up emitting an i1 vp.merge. Unfortunately, because RVV doesn't have any tail-undisturbed mask instructions, an i1 vp.merge gets expanded to a lengthy sequence:

```asm
vsetvli a1, zero, e64, m1, ta, ma
vid.v v10                 # lane indices
vmsltu.vx v10, v10, a0    # lanes below EVL (a0)
vmand.mm v9, v9, v10      # select = mask & (lane < EVL)
vmandn.mm v8, v8, v9      # on_false & ~select
vmand.mm v9, v0, v9       # on_true & select
vmor.mm v0, v9, v8        # combine both halves
```

This patch addresses this by matching this specific AnyOf pattern in RISCVCodeGenPrepare and widening it from i1 to i8, which ends up producing a single masked i8 `vor.vi` inside the loop. The i1 form:

```llvm
loop:
  %phi = phi <vscale x 4 x i1> [ zeroinitializer, %entry ], [ %rec, %loop ]
  %cmp = icmp ...
  %rec = call <vscale x 4 x i1> @llvm.vp.merge.nxv4i1(<vscale x 4 x i1> %cmp, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i1> %phi, i32 %evl)
```

becomes:

```llvm
loop:
  %phi = phi <vscale x 4 x i8> [ zeroinitializer, %entry ], [ %rec, %loop ]
  %cmp = icmp ...
  %rec = call <vscale x 4 x i8> @llvm.vp.merge.nxv4i8(<vscale x 4 x i1> %cmp, <vscale x 4 x i8> splat (i8 1), <vscale x 4 x i8> %phi, i32 %evl)
  %trunc = trunc <vscale x 4 x i8> %rec to <vscale x 4 x i1>
```

I ended up adding this in RISCVCodeGenPrepare instead of the LoopVectorizer itself, since doing it in the vectorizer would have required adding a target hook. It may also be possible to generalize this to other i1 vp.merges in the future.

Normally the trunc will be sunk outside of the loop. However, the transform doesn't check whether all the non-phi users of the vp.merge are outside of the loop: even with in-loop users it still seems to be profitable; see the test diff in `@widen_anyof_rdx_use_in_loop`.

Fixes #132180
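Why the i1 to i8 widening preserves the result can be checked with a small lane-level model. This is a sketch in plain C++, not LLVM code: `vp_merge` follows the documented semantics of `llvm.vp.merge` (lanes at or above EVL keep the `on_false` operand, i.e. the tail is undisturbed), and the hypothetical `any_of_greater` runs a strip-mined AnyOf reduction with either a `bool` phi (the i1 form) or a `uint8_t` phi (the widened i8 form), truncating to the low bit after the loop. The function names, the `> threshold` predicate and the fixed VL are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane-wise model of llvm.vp.merge with a splatted on_true value:
// lanes [0, evl) take on_true where the mask is set, else on_false;
// lanes [evl, VL) keep on_false (tail undisturbed).
template <typename T>
std::vector<T> vp_merge(const std::vector<bool>& mask, T on_true,
                        const std::vector<T>& on_false, std::size_t evl) {
  assert(mask.size() == on_false.size() && evl <= mask.size());
  std::vector<T> result = on_false;
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i]) result[i] = on_true;
  return result;
}

// Hypothetical AnyOf loop: returns true if any data[i] > threshold.
// Elem is the phi element type: bool models i1, std::uint8_t models i8.
template <typename Elem>
bool any_of_greater(const std::vector<int>& data, int threshold,
                    std::size_t vl) {
  std::vector<Elem> phi(vl, static_cast<Elem>(0));  // zeroinitializer
  for (std::size_t base = 0; base < data.size(); base += vl) {
    // EVL shrinks on the final, partial iteration.
    std::size_t evl = std::min(vl, data.size() - base);
    std::vector<bool> cmp(vl, false);
    for (std::size_t i = 0; i < evl; ++i)
      cmp[i] = data[base + i] > threshold;
    phi = vp_merge<Elem>(cmp, static_cast<Elem>(1), phi, evl);
  }
  // After the loop: trunc to the low bit, then or-reduce across lanes.
  bool any = false;
  for (Elem lane : phi) any = any || ((lane & 1) != 0);
  return any;
}
```

With `data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3}`, a VL of 4 and a threshold of 8, both instantiations return true. The last iteration runs with an EVL of 2, and the upper two lanes keep the values from earlier iterations, which is exactly the tail-undisturbed behaviour the i1 form cannot get cheaply from RVV mask instructions.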