Files
clang-p2996/mlir/lib/Conversion/GPUToSPIRV
Andrea Faulds 7aa22f013e [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)
This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
2024-08-20 13:37:03 -04:00
..