clang-p2996

Files

Navdeep Kumar 875eb523c1 [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops

Add warp-synchronous matrix-multiply-accumulate ops to the GPU and NVVM
dialects. Add the following three ops to the GPU dialect:
  1.) subgroup_mma_load_matrix
  2.) subgroup_mma_store_matrix
  3.) subgroup_mma_compute
Add the following three ops to the NVVM dialect:
  1.) wmma.m16n16k16.load.[a,b,c].[f16,f32].row.stride
  2.) wmma.m16n16k16.store.d.[f16,f32].row.stride
  3.) wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]

Reviewed By: bondhugula, ftynse, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D95330

2021-05-06 12:06:25 +05:30

CMakeLists.txt

[mlir] make implementations of translation to LLVM IR interfaces private

2021-03-04 09:16:32 +01:00

NVVMToLLVMIRTranslation.cpp

[MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops

2021-05-06 12:06:25 +05:30