clang-p2996/llvm/lib/Target/RISCV/RISCVISelLowering.cpp at cf5d8def5cf66fbdfffa00a4845bd648ec58ed60

Philip Reames de34d39b66 [RISCV] Cap build vector cost to avoid quadratic cost at high LMULs

Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.
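The cost argument above can be sketched as a tiny model. This is only an illustration under stated assumptions, not the code in RISCVISelLowering.cpp: the function names are hypothetical, each slide is assumed to cost exactly `lmul` units, and the budget constant of 2 per element is an arbitrary choice made for the example.

```cpp
// Hypothetical cost model (not the LLVM implementation).
// Assumption: one vslide1down costs `lmul` units, and a build vector
// with `numElts` unique elements needs one slide per element.
unsigned slideLoweringCost(unsigned numElts, unsigned lmul) {
  return numElts * lmul;  // O(VL * LMUL)
}

// Hypothetical budget that grows only linearly with the element count.
// The factor 2 is an illustrative assumption.
unsigned linearBudget(unsigned numElts) {
  return 2 * numElts;
}

// Fall back to the stack once the slide sequence exceeds the linear budget.
bool shouldUseStack(unsigned numElts, unsigned lmul) {
  return slideLoweringCost(numElts, lmul) > linearBudget(numElts);
}
```

Because the element count itself scales with LMUL, the modeled slide cost grows quadratically while the budget grows linearly, so a check like `shouldUseStack` eventually trips as LMUL rises. For example, with 2 elements per register, `shouldUseStack(2, 1)` is false and `shouldUseStack(16, 8)` is true.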

For context, here are the sifive-x280 llvm-mca results for the current lowering and the stack-based lowering at each LMUL (using e64). This assumes the code was compiled for V (i.e. zvl128b).
buildvector_m1_via_stack.mca:Total Cycles: 1904
buildvector_m2_via_stack.mca:Total Cycles: 2104
buildvector_m4_via_stack.mca:Total Cycles: 2504
buildvector_m8_via_stack.mca:Total Cycles: 3304
buildvector_m1_via_vslide1down.mca:Total Cycles: 804
buildvector_m2_via_vslide1down.mca:Total Cycles: 1604
buildvector_m4_via_vslide1down.mca:Total Cycles: 6400
buildvector_m8_via_vslide1down.mca:Total Cycles: 25599

There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, the stack-based lowering seems to cost better on all LMULs, so we can just go with the simpler scheme.

Arguably, this patch fixes a regression introduced by my D149667: before that change, we always fell back to the stack and thus didn't have the non-linearity.

Differential Revision: https://reviews.llvm.org/D159332

2023-09-05 09:03:26 -07:00

711 KiB
