clang-p2996

Files

Durgadoss R 13d6233e77 [MLIR][NVGPU] Fix nvgpu_arrive syntax in matmulBuilder.py (#113713)

This patch updates the syntax of the nvgpu_arrive Op
in matmulBuilder.py, which fixes the compilation
error for this test.

For the warp-specialized matmul_kernel implementation,
removing the WaitGroupSyncOp (after the mma-main-loop)
fixes the observed hang.

With these two fixes, the test compiles and
executes successfully on an sm90a machine.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

2024-10-26 11:15:50 +05:30

sm90

[MLIR][NVGPU] Fix nvgpu_arrive syntax in matmulBuilder.py (#113713)

2024-10-26 11:15:50 +05:30

TensorCore

…

all-reduce-and.mlir

…

all-reduce-maxsi.mlir

…

all-reduce-minsi.mlir

…

all-reduce-op.mlir

…

all-reduce-or.mlir

…

all-reduce-region.mlir

…

all-reduce-xor.mlir

…

alloc-host-shared.mlir

[mlir][gpu] Add mlir_c_runner_utils to fix #99035

2024-07-17 09:23:32 +02:00

async.mlir

[mlir] Fix GPU integration test (part 2) (#98918)

2024-07-15 17:39:16 +02:00

dump-ptx.mlir

[mlir][GPU][NFC] Move dump-ptx.mlir test case (#111142)

2024-10-04 15:13:20 +02:00

dump-sass.mlir

[MLIR] Dump sass (#110227)

2024-09-27 13:52:15 +02:00

gpu-to-cubin.mlir

…

lit.local.cfg

…

multiple-all-reduce.mlir

…

printf.mlir

…

shuffle.mlir

…

two-modules.mlir

…