clang-p2996/mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp at e0ad34e56590fa2e6ffdf617e044de7eadee2139

Andrea Faulds 7aa22f013e [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
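
To illustrate what "several reductions in parallel, each smaller than the size of the subgroup" means, here is a minimal host-side sketch in plain C++. It is not the MLIR lowering and does not use any MLIR or SPIR-V API. The helper name `clusteredReduceAdd` is hypothetical, and the sketch assumes that clusters are made of contiguous lanes, that the subgroup size is a multiple of the cluster size, and that every lane receives the result of its own cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: lanes[i] stands for the value held by lane i of one
// subgroup. Hypothetical helper for illustration, not part of MLIR.
// Assumption: clusters are contiguous groups of `clusterSize` lanes.
std::vector<int> clusteredReduceAdd(const std::vector<int> &lanes,
                                    std::size_t clusterSize) {
  assert(clusterSize != 0 && lanes.size() % clusterSize == 0);
  std::vector<int> result(lanes.size());
  for (std::size_t base = 0; base < lanes.size(); base += clusterSize) {
    // Reduce the lanes of this cluster independently of the other clusters.
    int sum = 0;
    for (std::size_t i = 0; i < clusterSize; ++i)
      sum += lanes[base + i];
    // Assumption: each lane in the cluster observes the cluster's result.
    for (std::size_t i = 0; i < clusterSize; ++i)
      result[base + i] = sum;
  }
  return result;
}
```

With eight lanes holding 1 through 8 and a cluster size of 4, this model yields 10 in the first four lanes and 26 in the last four. A cluster size equal to the number of lanes reduces over the whole modeled subgroup.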

2024-08-20 13:37:03 -04:00

26 KiB
