clang-p2996

Files

Guray Ozen 8dd0d95c7c [mlir][nvgpu] Add nvgpu.tma.async.store (#77811)

This PR adds the `nvgpu.tma.async.store` Op, which performs asynchronous
stores using the Tensor Memory Access (TMA) unit.

It also implements lowering of the Op to the NVVM dialect. The Op
currently stores a tile of memory asynchronously from shared memory to
global memory for a single CTA.

2024-01-15 11:44:51 +01:00

CMakeLists.txt

[mlir][nvgpu] Add nvgpu.tma.async.load and nvgpu.tma.descriptor

2023-07-21 10:23:25 +02:00

NVGPUDialect.cpp

[mlir][nvgpu] Add nvgpu.tma.async.store (#77811)

2024-01-15 11:44:51 +01:00