clang-p2996

Files

Guray Ozen 18e161f9e1 [MLIR][NVVM] Introduction of the wgmma.mma_async Op

This work introduces the `wgmma.mma_async` Op along with its PTX generation using `BasicPtxBuilderOpInterface`. The Op executes the matrix multiply-and-accumulate operation across a warpgroup (128 threads). Note that this operation requires a device with the sm_90a capability.

The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
	D = A * B + D : The result is added to the accumulator matrix D.
	D = A * B     : The input from the accumulator matrix D is not used.
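
The two forms above can be illustrated with a minimal scalar reference in C++. This is only a sketch of the arithmetic, not the Op's implementation or its PTX lowering: the function name `mma_reference`, the row-major layout, the `float` element type, and the `accumulate` flag are all assumptions made for illustration and do not appear in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not part of the commit).
// Matrices are row-major: A is m x k, B is k x n, D is m x n.
//   accumulate == true  : D = A * B + D
//   accumulate == false : D = A * B  (previous contents of D are ignored)
void mma_reference(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& d, std::size_t m, std::size_t n,
                   std::size_t k, bool accumulate) {
  assert(a.size() == m * k && b.size() == k * n && d.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Start from the old accumulator value only in the D = A * B + D form.
      float sum = accumulate ? d[i * n + j] : 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        sum += a[i * k + p] * b[p * n + j];
      }
      d[i * n + j] = sum;
    }
  }
}
```

Calling `mma_reference(a, b, d, m, n, k, true)` corresponds to the first form and passing `false` to the second. On hardware the same arithmetic is distributed across the 128 threads of the warpgroup rather than computed in a single loop nest.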

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D157370

2023-08-09 23:08:00 +02:00

CMakeLists.txt

…

NVVMToLLVM.cpp

[MLIR][NVVM] Introduction of the wgmma.mma_async Op

2023-08-09 23:08:00 +02:00