clang-p2996

Files

Andrea Faulds 7aa22f013e [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
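
As a rough illustration of what "several reductions in parallel, each smaller than the size of the subgroup" could mean, the following host-side sketch simulates a subgroup as a Python list with one entry per lane. It is not the MLIR lowering from this commit. The function name `clustered_subgroup_reduce` is hypothetical, and two behaviours are assumptions made for the example: that each cluster consists of contiguous lanes, and that every lane receives the result of its own cluster.

```python
from functools import reduce
from operator import add


def clustered_subgroup_reduce(lane_values, cluster_size=None, op=add):
    """Simulate a (possibly clustered) subgroup reduction on the host.

    lane_values  -- one value per lane of the simulated subgroup
    cluster_size -- lanes per cluster; None means one cluster spanning
                    the whole subgroup (an ordinary subgroup reduction)
    op           -- binary reduction operator, e.g. operator.add

    Assumption for this sketch: clusters are contiguous runs of lanes
    and the reduced value is returned on every lane of its cluster.
    """
    subgroup_size = len(lane_values)
    if cluster_size is None:
        cluster_size = subgroup_size
    if cluster_size <= 0 or subgroup_size % cluster_size != 0:
        raise ValueError("cluster_size must evenly divide the subgroup size")

    results = []
    for start in range(0, subgroup_size, cluster_size):
        cluster = lane_values[start:start + cluster_size]
        reduced = reduce(op, cluster)
        # Each lane in the cluster observes the same reduced value.
        results.extend([reduced] * len(cluster))
    return results


lanes = list(range(8))  # a toy subgroup of 8 lanes holding 0..7
full = clustered_subgroup_reduce(lanes)
clustered = clustered_subgroup_reduce(lanes, cluster_size=4)
```

With a toy 8-lane subgroup, `cluster_size=4` yields two independent sums (one per group of four lanes) instead of a single sum over all eight, which is the kind of smaller, shape-constrained reduction the message describes for data kept in registers.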

2024-08-20 13:37:03 -04:00

CMakeLists.txt

[mlir][spirv] Add integration tests for vector.interleave and vector.shuffle (#93858)

2024-06-03 10:12:39 -04:00

GPUToSPIRV.cpp

[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

2024-08-20 13:37:03 -04:00

GPUToSPIRVPass.cpp

[mlir][spirv] Add integration tests for vector.interleave and vector.shuffle (#93858)

2024-06-03 10:12:39 -04:00

WmmaOpsToSPIRV.cpp

[mlir][spirv] Drop support for SPV_NV_cooperative_matrix (#76782)

2024-01-08 17:57:52 -05:00