Files
clang-p2996/llvm/test/CodeGen
Durgadoss R c507a0830d [NVPTX] Add TMA Bulk Copy Intrinsics (#138679)
This patch adds a new variant of TMA Bulk Copy
intrinsics introduced in sm100+. This variant
has an additional byte_mask to select the bytes
for the copy operation.

* Selection is all done through table-gen now.
  So, this patch removes the corresponding
  SelectCpAsyncBulkS2G() function.
* lit tests are verified with a cuda-12.8 ptxas
  executable.

PTX Spec link:

https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-bulk-copy

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-05-15 16:08:01 +05:30
..