clang-p2996/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
Christian Sigg 7851b1bcf1 [mlir][gpu] Change GPU modules to globals (#135478)
Load/unload GPU modules in global ctors/dtors instead of on every kernel
launch.

Loading a GPU module is a heavyweight operation that synchronizes the GPU
context. With the modules now loaded ahead of time, asynchronously
launched kernels can run concurrently; see
https://discourse.llvm.org/t/how-to-lower-the-combination-of-async-gpu-ops-in-gpu-dialect.
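
The load-once idea can be sketched in plain C++ as below. This is only an illustration under stated assumptions, not the code in this PR: `FakeModule`, `fakeLoad()`, `fakeUnload()`, `GlobalModule`, `sketchLaunch()` and the counters are all hypothetical stand-ins, and no real driver call is made. An object with static storage duration runs the expensive load in its constructor and the unload in its destructor, so the launch path never touches the loader.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a driver-level module handle (not a real CUDA type).
struct FakeModule {
  std::string name;
  bool loaded = false;
};

// Counts how often the expensive load step ran (zero-initialized before any
// dynamic initialization in this translation unit).
int g_loadCount = 0;
int g_launchCount = 0;

// Hypothetical stand-ins for the heavyweight load/unload steps.
void fakeLoad(FakeModule &m) {
  m.loaded = true;
  ++g_loadCount;
}
void fakeUnload(FakeModule &m) { m.loaded = false; }

// RAII holder: loads in the constructor, unloads in the destructor. For an
// instance with static storage duration these run as a global ctor/dtor.
class GlobalModule {
public:
  explicit GlobalModule(std::string name) {
    module_.name = std::move(name);
    fakeLoad(module_);
  }
  ~GlobalModule() { fakeUnload(module_); }
  GlobalModule(const GlobalModule &) = delete;
  GlobalModule &operator=(const GlobalModule &) = delete;

  const FakeModule &get() const { return module_; }

private:
  FakeModule module_;
};

// Loaded once during static initialization, unloaded at program exit.
GlobalModule g_kernels("kernels");

// Hypothetical launch path: it only uses the already-loaded module and never
// calls the loader itself.
void sketchLaunch(const GlobalModule &m) {
  assert(m.get().loaded && "module must be loaded before launch");
  ++g_launchCount;
}
```

Calling `sketchLaunch(g_kernels)` any number of times leaves `g_loadCount` at 1, which is the property that keeps the context-synchronizing load off the launch path.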

The implementations of `embedBinary()` and `launchKernel()` currently use
slightly different mechanics, but I prefer not to change the latter more
than necessary as part of this PR. I will prepare a follow-up NFC change
for `launchKernel()` to align them again.
2025-04-22 13:49:58 +02:00
